Abstract

We introduce the notion of restricted sensitivity as an alternative to global and smooth sensitivity to improve accuracy in differentially private data analysis. The definition of restricted sensitivity is similar to that of global sensitivity except that instead of quantifying over all possible datasets, we take advantage of any beliefs about the dataset that a querier may have, to quantify over a restricted class of datasets. Specifically, given a query $f$ and a hypothesis $\mathcal{H}$ about the structure of a dataset $D$, we show generically how to transform $f$ into a new query $f_{\mathcal{H}}$ whose global sensitivity (over all datasets including those that do not satisfy $\mathcal{H}$) matches the restricted sensitivity of the query $f$. Moreover, if the belief of the querier is correct (i.e., $D \in \mathcal{H}$) then $f_{\mathcal{H}}(D) = f(D)$. If the belief is incorrect, then $f_{\mathcal{H}}(D)$ may be inaccurate.

We demonstrate the usefulness of this notion by considering the task of answering queries regarding social networks, which we model as a combination of a graph and a labeling of its vertices. In particular, while our generic procedure is computationally inefficient, for the specific definition of $\mathcal{H}$ as graphs of bounded degree, we exhibit efficient ways of constructing $f_{\mathcal{H}}$ using different projection-based techniques. We then analyze two important query classes: subgraph counting queries (e.g., number of triangles) and local profile queries (e.g., number of people who know a spy and a computer scientist who know each other). We demonstrate that the restricted sensitivity of such queries can be significantly lower than their smooth sensitivity. Thus, using restricted sensitivity we can maintain privacy whether or not $D \in \mathcal{H}$, while providing more accurate results in the event that $\mathcal{H}$ holds true.
1 Introduction



The social networks we inhabit have grown significantly in recent decades with digital technology 
enabling the rise of networks like Facebook that now connect over 900 million people and house 
vast repositories of personal information. At the same time, the study of various characteristics of 
social networks has emerged as an active research area [10]. Yet the fact that the data in a social 
network might be used to infer sensitive details about an individual, like sexual orientation [15], is a
growing concern among social networks' participants. Even in an 'anonymized' unlabeled graph it
is possible to identify people based on graph structures [3]. In this paper, we study the feasibility of
and design efficient algorithms to release statistics about social networks (modeled as graphs with
vertices labeled with attributes) while satisfying the semantic definition of differential privacy [8, 9].

A differentially private mechanism guarantees that any two neighboring data sets (i.e., data 
sets that differ only on the information about a single individual) induce similar distributions over 
the statistics released. For social networks, we consider two notions of neighboring or adjacent 
networks: (1) edge adjacency stipulating that adjacent graphs differ in just one edge or in the 
attributes of just one vertex; and (2) vertex adjacency stipulating that adjacent networks differ on 
just one vertex — its attributes or any number of edges incident to it. 

For any given statistic or query, its global sensitivity measures the maximum difference in the 
answer to that query over all pairs of neighboring data sets [9]; global sensitivity provides an upper
bound on the amount of noise that has to be added to the actual statistic in order to preserve
differential privacy. Since the global sensitivity of certain types of queries can be quite high, the
notion of smooth sensitivity was introduced to reduce the amount of noise that needs to be added
while still preserving differential privacy [18].

However, a key challenge in the differentially private analysis of social networks is that for many 
natural queries, both global and smooth sensitivity can be very large. In the vertex adjacency 
model, consider the query "How many people in $G_1$ are a doctor or are friends with a doctor?" Even if the answer is 0 (e.g., there are no doctors in the social network), there is a neighboring social network $G_2$ in which the answer is $n$ (e.g., pick an arbitrary person from $G_1$, relabel him as a doctor, and connect him to everyone). Even in the edge adjacency model, the sensitivity of queries may be high. Consider the query "How many people in $G_1$ are friends with two doctors who are also friends with each other?" In $G_1$ the answer may be 0 even if there are two doctors that everyone else is friends with (e.g., the doctors are not friends with each other), but the answer jumps to $n - 2$ in a neighboring graph $G_2$ (e.g., if we simply connect the doctors to each other). In fact, even the first query can have high sensitivity in the edge-adjacency model if we just relabel a high-degree vertex as a doctor.
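The vertex-adjacency example for the first query can be checked mechanically. The minimal Python sketch below is our own illustration; encoding a labeled graph as a vertex count, an edge list and a set of doctor vertices is an assumption, not the paper's representation. It evaluates the query on an edgeless, doctor-free $G_1$ and on the neighbor $G_2$ obtained by relabeling vertex 0 as a doctor and connecting it to everyone.

```python
def doctor_or_friend_of_doctor(n, edges, doctors):
    """Count vertices that are doctors or are adjacent to at least one doctor."""
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return sum(1 for v in range(n) if v in doctors or neighbors[v] & doctors)


n = 10
g1_edges, g1_doctors = [], set()                           # no doctors, no friendships
g2_edges, g2_doctors = [(0, v) for v in range(1, n)], {0}  # vertex 0 relabeled, linked to all

# Only vertex 0's label and the edges touching vertex 0 differ, so G1 and G2 are
# neighbors under vertex adjacency, yet the answers are 0 and n.
print(doctor_or_friend_of_doctor(n, g1_edges, g1_doctors))  # 0
print(doctor_or_friend_of_doctor(n, g2_edges, g2_doctors))  # 10
```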

Yet, while these examples respect the mathematical definitions of neighboring graphs and networks, we note that in a real social network no single individual is likely to be directly connected with everyone else. Suppose that in fact a querier has some such belief $\mathcal{H}$ about the given network ($\mathcal{H}$ is a subset of all possible networks) such that its query $f$ has low sensitivity restricted only to inputs and deviations within $\mathcal{H}$. For example, the querier may believe the following hypothesis ($\mathcal{H}_k$): the maximum degree of any node in the network is at most $k = 5000 \ll n \approx 9 \times 10^8$ (e.g., after reading a study on the anatomy of Facebook [21]). Can one in that case provide accurate answers in the event that indeed $G \in \mathcal{H}$, and yet preserve privacy no matter what (even if $\mathcal{H}$ is not satisfied)?

In this work, we provide a positive answer to this question. We do so by introducing the notion of restricted sensitivity, which represents the sensitivity of the query $f$ over only the given subset $\mathcal{H}$, and by providing procedures that map a query $f$ to an alternative query $f_{\mathcal{H}}$ such that $f$ and $f_{\mathcal{H}}$ agree on all inputs in $\mathcal{H}$, yet the global sensitivity of $f_{\mathcal{H}}$ is comparable to just the restricted sensitivity of $f$. Therefore, the mechanism that answers according to $f_{\mathcal{H}}$ and adds Laplace random noise preserves privacy for all inputs, while giving good estimations of $f$ for inputs in $\mathcal{H}$.

Table 1: Summary of Results. GS = global sensitivity, RS = restricted sensitivity, and S = smooth upper bound on the local sensitivity.

Table 2: Worst-Case Smooth Sensitivity over $\mathcal{H}_k$ vs. Restricted Sensitivity $RS_f(\mathcal{H}_k)$.

While our general scheme for devising such $f_{\mathcal{H}}$ is inefficient and requires that we construct a separate $f_{\mathcal{H}}$ for each query $f$, we also design a complementary projection-based approach. A projection $\mu$ onto $\mathcal{H}$ is a function mapping all possible inputs (e.g., all possible $n$-node social networks) to inputs in $\mathcal{H}$ with the property that any input in $\mathcal{H}$ is mapped to itself. Therefore, a projection $\mu$ allows us to define $f_{\mathcal{H}}$ for any $f$, simply by composing $f_{\mathcal{H}} = f \circ \mu$. Moreover, if this projection $\mu$ satisfies certain smoothness properties, which we define in Section 4, then this function $f_{\mathcal{H}}$ will have its global sensitivity (or at least its smooth sensitivity over inputs in $\mathcal{H}$) comparable to only the restricted sensitivity of $f$. In particular, for the case $\mathcal{H} = \mathcal{H}_k$ (the assumption that the network has degree at most $k \ll n$), we show we can efficiently construct projections $\mu$ satisfying these conditions, therefore allowing us to efficiently take advantage of low restricted sensitivity. These results are given in Section 4 and summarized in Table 1.

The next natural question is: how much advantage does restricted sensitivity provide, compared to global or smooth sensitivity, for natural query classes and natural sets $\mathcal{H}$? In Section 5 we consider two natural classes of queries: local profile queries and subgraph counting queries. A local profile query asks how many nodes $v$ in a graph satisfy a property which depends only on the immediate neighborhood of $v$ (e.g., queries relating to clustering coefficients and bridges [10], or queries such as "how many people know two spies who don't know each other?"). A subgraph counting query asks how many copies of a particular subgraph $P$ are contained in the network (e.g., number of triangles involving at least one spy). For the case $\mathcal{H} = \mathcal{H}_k$ for $k \ll n$, we show that the restricted sensitivity of these classes of queries can indeed be much lower than the smooth sensitivity. These results, presented in Section 5, are summarized in Table 2.

1.1 Related Work 

Easley and Kleinberg provide an excellent summary of the rich literature on social networks [10].
Previous literature on differentially private analysis of social networks has primarily focused on
the edge adjacency model in unlabeled graphs where sensitivity is manageable. Triangle counting
queries can be answered in the edge adjacency model by efficiently computing the smooth
sensitivity [18], and this result can be extended to answer other counting queries [16]. [14] shows
how to privately approximate the degree distribution in the edge adjacency model. The
Johnson-Lindenstrauss transform can be used to answer all cut queries in the edge adjacency model [5].

The approach taken in the work of Rastogi et al. [19] on answering subgraph counting queries is
the most similar to ours. They consider a Bayesian adversary whose prior (background knowledge)
is drawn from a distribution. Leveraging an assumption about the adversary's prior, they compute
a high-probability upper bound on the local sensitivity of the data and then answer by adding noise
proportional to that bound. Loosely, they assume that the presence of an edge does not make the
presence of other edges more likely. In the specific context of a social network this assumption is
widely believed to be false (e.g., two people are more likely to become friends if they already have
common friends [10]). The privacy guarantees of [19] only hold if these assumptions about the
adversary's prior are true. By contrast, we always guarantee privacy even if the assumptions are incorrect.

A relevant approach that deals with preserving differential privacy while providing better utility
guarantees for nice instances is detailed in the work of Nissim et al. [18], who define the notion of
smooth sensitivity. In their framework, the amount of random noise that the mechanism adds to
a query's true answer depends on the extent to which the input database is "nice", i.e., has
small local sensitivity. As we discuss later, in social networks many natural queries (e.g., local
profile queries) have high local and smooth sensitivity even on graphs of small degree.

2 Preliminaries 

2.1 Differential Privacy 

We adopt the framework of differential privacy. We use $\mathcal{D}$ to denote the set of all possible datasets. Intuitively, we say two datasets $D, D' \in \mathcal{D}$ are neighbors if they differ on the details of a single individual. (See further discussion in Definitions 6 and 7.) We denote the fact that $D'$ is a neighbor of $D$ using $D' \sim D$. We define the distance $d(D, D')$ between two databases $D, D' \in \mathcal{D}$ as the minimal non-negative integer $k$ such that there exists a path $D_0, D_1, \ldots, D_k$ where $D_0 = D$, $D_k = D'$, and for every $1 \le i \le k$ we have $D_{i-1} \sim D_i$. Given a subset $\mathcal{D}' \subseteq \mathcal{D}$, we denote the distance of a database $D$ to $\mathcal{D}'$ as

$$d(D, \mathcal{D}') = \min_{D' \in \mathcal{D}'} d(D, D').$$
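For a finite universe with an explicit neighbor relation, the path distance $d(D, D')$ and the distance $d(D, \mathcal{D}')$ to a subset can be computed by breadth-first search. The sketch below is illustrative only: the `neighbors` callback and the toy universe of 3-bit tuples (neighbors differ in one coordinate) are our assumptions, and for the graph universes of this paper the search space is exponentially large.

```python
import math
from collections import deque


def distance(d1, d2, neighbors):
    """Shortest-path distance d(d1, d2), where neighbors(D) yields every D' ~ D."""
    if d1 == d2:
        return 0
    seen = {d1}
    frontier = deque([(d1, 0)])
    while frontier:
        current, k = frontier.popleft()
        for nxt in neighbors(current):
            if nxt == d2:
                return k + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, k + 1))
    return math.inf  # d2 is unreachable from d1


def distance_to_set(d, subset, neighbors):
    """d(D, D') = min over D'' in D' of d(D, D'')."""
    return min(distance(d, other, neighbors) for other in subset)


def flip_one_bit(bits):
    """Toy neighbor relation: datasets are bit tuples differing in one position."""
    for i in range(len(bits)):
        yield bits[:i] + (1 - bits[i],) + bits[i + 1:]


print(distance((0, 0, 0), (1, 1, 1), flip_one_bit))                      # 3
print(distance_to_set((0, 0, 0), [(1, 1, 0), (1, 0, 0)], flip_one_bit))  # 1
```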

Definition 1. [8] A mechanism $A$ is $(\varepsilon, \delta)$-differentially private if for every pair of neighboring datasets $D, D' \in \mathcal{D}$ and every subset $S \subseteq \mathrm{Range}(A)$ we have that $\Pr[A(D) \in S] \le e^{\varepsilon} \Pr[A(D') \in S] + \delta$.

Intuitively, differential privacy guarantees that an adversary has a very limited ability to distinguish between the output of $A(D)$ and the output of $A(D')$. A query is a function $f : \mathcal{D} \to \mathbb{R}$ mapping the dataset to a real number.

Definition 2. The local sensitivity of a query $f$ at a dataset $D$ is $LS_f(D) = \max_{D' \sim D} |f(D) - f(D')|$.

Definition 3. The global sensitivity of a query $f$ is $GS_f = \max_{D \in \mathcal{D}} LS_f(D)$.

The Laplace mechanism $A(D) = f(D) + \mathrm{Lap}(GS_f/\varepsilon)$ preserves $(\varepsilon, 0)$-differential privacy [9].
This mechanism provides useful answers to queries with low global sensitivity. The primary challenge
in the differentially private analysis of social networks is the high global sensitivity of many
queries. The local sensitivity $LS_f(D)$ may be significantly lower than the global sensitivity $GS_f$.
However, adding noise proportional to $LS_f(D)$ does not preserve differential privacy because the
noise level itself may leak information. A clever way to circumvent this problem is to smooth out
the noise level [18].
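A minimal sketch of the Laplace mechanism $A(D) = f(D) + \mathrm{Lap}(GS_f/\varepsilon)$, assuming the true answer $f(D)$ and $GS_f$ are already known. Sampling $\mathrm{Lap}(b)$ as the difference of two independent exponential variables with mean $b$ is a standard identity; the function names are ours. The noise scale depends only on $GS_f$ and $\varepsilon$, never on $D$ itself, which is exactly why calibrating it to $LS_f(D)$ instead would be unsafe.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, global_sensitivity, epsilon, rng):
    """(epsilon, 0)-DP release of a query whose global sensitivity is `global_sensitivity`."""
    return true_answer + laplace_noise(global_sensitivity / epsilon, rng)


rng = random.Random(0)
print(laplace_mechanism(42.0, global_sensitivity=1.0, epsilon=0.5, rng=rng))
```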

Definition 4. [18] A $\beta$-smooth upper bound on the local sensitivity of a query $f$ is a function $S_{f,\beta}$ which satisfies (i) $\forall D \in \mathcal{D}$, $S_{f,\beta}(D) \ge LS_f(D)$, and (ii) $\forall D, D' \in \mathcal{D}$ it holds that $S_{f,\beta}(D) \le \exp(\beta \cdot d(D, D')) \cdot S_{f,\beta}(D')$.

It is possible to preserve privacy while adding noise proportional to a $\beta$-smooth upper bound on the local sensitivity: the mechanism $A(D) = f(D) + \mathrm{Lap}\bigl(2 S_{f,\beta}(D)/\varepsilon\bigr)$ with $\beta = -\varepsilon/(2\ln\delta)$ preserves $(\varepsilon, \delta)$-differential privacy [18]. To evaluate $A$ efficiently one must present an algorithm to efficiently compute the $\beta$-smooth upper bound $S_{f,\beta}(D)$, a task which is by itself often non-trivial.
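Given any $\beta$-smooth upper bound, the mechanism above only needs $\beta = -\varepsilon/(2\ln\delta)$ and a Laplace draw of scale $2S_{f,\beta}(D)/\varepsilon$. The sketch below assumes the caller supplies the value $S_{f,\beta}(D)$ (computing it is the hard part, as noted); the function names are our own.

```python
import math
import random


def smooth_beta(epsilon, delta):
    """beta = -epsilon / (2 ln delta); positive whenever 0 < delta < 1."""
    return -epsilon / (2.0 * math.log(delta))


def smooth_sensitivity_mechanism(true_answer, smooth_bound, epsilon, rng):
    """Release f(D) + Lap(2 * S_{f,beta}(D) / epsilon) for a precomputed smooth bound."""
    scale = 2.0 * smooth_bound / epsilon
    if scale == 0:
        return true_answer  # Lap(0) is the point mass at 0
    return true_answer + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


epsilon, delta = 1.0, 1e-6
beta = smooth_beta(epsilon, delta)  # about 0.036
print(beta, smooth_sensitivity_mechanism(10.0, 3.0, epsilon, random.Random(0)))
```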

2.2 Graphs and Social Networks 

Our work is motivated by the challenges posed by differentially private analysis of social networks. 
As always, a graph is a pair of a set of vertices and a set of edges, $G = (V, E)$. We often just denote
a graph as $G$, referring to its vertex set or edge set as $V(G)$ or $E(G)$, respectively. A key aspect of our
work is modeling a social network as a labeled graph.

Definition 5. A social network $(G, \ell)$ is a graph with a labeling function $\ell : V(G) \to \mathbb{R}^m$. The set of all social networks is denoted $\mathcal{G}$.

The labeling function $\ell$ allows us to encode information about the nodes (e.g., age, gender,
occupation). For convenience, we assume all social networks are over the same set of vertices,
which is denoted as $V$ and has size $n$, and so we assume $|V| = n$ is public knowledge.¹ Therefore,
the graph structures of two social networks are equal if their edge sets are identical. Similarly, we
also fix the dimension $m$ of our labeling.

Defining differential privacy over the labeled graphs $\mathcal{G}$ requires care. What does it mean for
two labeled graphs $G_1, G_2 \in \mathcal{G}$ to be neighbors? There are two natural notions: edge-adjacency
and vertex-adjacency.

Definition 6 (Edge-adjacency). We say that two social networks $(G_1, \ell_1)$ and $(G_2, \ell_2)$ are neighbors if either (i) $E(G_1) = E(G_2)$ and there exists a vertex $u$ such that $\ell_1(u) \neq \ell_2(u)$ whereas for every other $v \neq u$ we have $\ell_1(v) = \ell_2(v)$, or (ii) $\forall v,\ \ell_1(v) = \ell_2(v)$ and the symmetric difference $E(G_1) \triangle E(G_2)$ contains a single edge.

In the context of a social network, differential privacy w.r.t. edge-adjacency can, for instance,
guarantee that an adversary will not be able to distinguish whether a particular individual has
friended some specific pop-singer on Facebook. However, such guarantees do not allow a person
to pretend to listen only to high-end indie rock bands, should that person have friended numerous
pop-singers on Facebook. This motivates the stronger vertex-adjacency neighborhood model.

Definition 7 (Vertex-adjacency). We say that two social networks $(G_1, \ell_1)$ and $(G_2, \ell_2)$ are neighbors if there exists a vertex $v_i$ such that $G_1 - v_i = G_2 - v_i$ and $\ell_1(v_j) = \ell_2(v_j)$ for every $v_j \neq v_i$, where for a graph $G$ and a vertex $v$ we denote by $G - v$ the result of removing every edge in $E(G)$ that touches $v$.

¹ Adding or removing vertices could be done by adding one more dimension to the labeling, indicating whether a node is active or inactive.

It is evident that any two social networks that are edge-adjacent are also vertex-adjacent.
Preserving differential privacy while guaranteeing good utility bounds w.r.t. vertex-adjacency is a
much harder task than w.r.t. edge-adjacency.



Distance. Given two social networks $(G_1, \ell_1)$ and $(G_2, \ell_2)$, recall that their distance is the minimal $k$ such that one can form a path of length $k$, starting at $(G_1, \ell_1)$ and ending at $(G_2, \ell_2)$, with the property that every two consecutive social networks on this path are adjacent. Given the above two definitions of adjacency, we would like to give an alternative characterization of this distance.

First of all, the set $U = \{v : \ell_1(v) \neq \ell_2(v)\}$ dictates $|U|$ steps that we must take in order to transition from $(G_1, \ell_1)$ to $(G_2, \ell_2)$. It is left to determine how many adjacent social networks we need to transition through until we have $E(G_1) = E(G_2)$. To that end, we construct the difference graph whose edges are the symmetric difference of $E(G_1)$ and $E(G_2)$. Clearly, to transition from $(G_1, \ell_1)$ to $(G_2, \ell_2)$, we need to alter every edge in the difference graph. In the edge-adjacency model, a pair of adjacent social networks covers precisely a single edge, and so the distance is $d((G_1, \ell_1), (G_2, \ell_2)) = |U| + |E(G_1) \triangle E(G_2)|$. In the vertex-adjacency model, a single vertex can cover all the edges that touch it (so the $|U|$ relabeling steps can also fix every edge touching $U$), and so the distance between the graphs $G_1 - U$ and $G_2 - U$ is precisely the size of a minimum vertex cover of their difference graph. Denoting this minimum vertex cover as $VC\bigl((G_1 - U) \triangle (G_2 - U)\bigr)$, we have that $d((G_1, \ell_1), (G_2, \ell_2)) = |U| + \bigl|VC\bigl((G_1 - U) \triangle (G_2 - U)\bigr)\bigr|$. Since any graph can arise as such a difference graph and minimum vertex cover is NP-hard, computing the distance between two social networks in the vertex-adjacency model is NP-hard in general.
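The characterization above translates directly into code. The sketch below is our own illustration: it represents a social network by a label dictionary and an edge list over the same vertex set, and it finds a minimum vertex cover by brute force, so it is exponential in $n$ and only meant for tiny examples, in line with the NP-hardness just noted.

```python
from itertools import combinations


def _edge_set(edges):
    return {frozenset(e) for e in edges}


def changed_labels(labels1, labels2):
    """U = {v : l1(v) != l2(v)}; both dicts share the same keys (the fixed vertex set V)."""
    return {v for v in labels1 if labels1[v] != labels2[v]}


def edge_adjacency_distance(labels1, edges1, labels2, edges2):
    """|U| + |E(G1) symmetric-difference E(G2)|."""
    return len(changed_labels(labels1, labels2)) + len(_edge_set(edges1) ^ _edge_set(edges2))


def min_vertex_cover_size(vertices, edges):
    """Brute-force minimum vertex cover (exponential time; tiny graphs only)."""
    for size in range(len(vertices) + 1):
        for cover in combinations(vertices, size):
            chosen = set(cover)
            if all(edge & chosen for edge in edges):
                return size


def vertex_adjacency_distance(labels1, edges1, labels2, edges2):
    """|U| + |VC((G1 - U) symmetric-difference (G2 - U))|."""
    changed = changed_labels(labels1, labels2)
    diff = _edge_set(edges1) ^ _edge_set(edges2)
    remaining = {e for e in diff if not (e & changed)}  # difference edges not touching U
    return len(changed) + min_vertex_cover_size(sorted(labels1), remaining)


labels = {v: "none" for v in range(5)}
star = [(0, v) for v in range(1, 5)]
print(edge_adjacency_distance(labels, [], labels, star))    # 4
print(vertex_adjacency_distance(labels, [], labels, star))  # 1
```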

To avoid cumbersome notation, from this point on we omit the differentiation between graphs
and social networks, and denote networks as graphs $G \in \mathcal{G}$.



3 Restricted Sensitivity 

We now introduce the notion of restricted sensitivity, using a hypothesis about the dataset $D$ to restrict the sensitivity of a query. A hypothesis $\mathcal{H}$ is a subset of the set $\mathcal{D}$ of all possible datasets (so in the context of social networks, $\mathcal{H}$ is a set of labeled graphs). We say that $\mathcal{H}$ is true if the true dataset $D \in \mathcal{H}$. Because the hypothesis $\mathcal{H}$ may not be a convex set (a shortest path between two datasets in $\mathcal{H}$ may leave $\mathcal{H}$), we must consider all pairs of datasets in $\mathcal{H}$ instead of all pairs of adjacent datasets as in the definition of global sensitivity.

Definition 8. For a given notion of adjacency among datasets, the restricted sensitivity of $f$ over a hypothesis $\mathcal{H} \subseteq \mathcal{D}$ is

$$RS_f(\mathcal{H}) = \max_{D_1, D_2 \in \mathcal{H},\ D_1 \neq D_2} \frac{|f(D_1) - f(D_2)|}{d(D_1, D_2)}.$$

To be clear, $d(D_1, D_2)$ denotes the length of the shortest path in $\mathcal{D}$ between $D_1$ and $D_2$ (not restricting the path to only use $D \in \mathcal{H}$) using the given notion of adjacency (e.g., edge-adjacency or vertex-adjacency). That is, we restrict the set of databases for which we compute the sensitivity, but we do not redefine the distances.

Observe that $RS_f(\mathcal{H})$ may be smaller than $LS_f(D)$ for some $D \in \mathcal{H}$ if $D$ has a neighbor $D' \notin \mathcal{H}$. In fact, we often have $LS_f(D) \ge |f(D) - f(D')| \gg RS_f(\mathcal{H})$. As an immediate corollary, in such cases $RS_f(\mathcal{H})$ will be significantly lower than $S_{f,\beta}(D)$, a $\beta$-smooth upper bound on $LS_f(D)$.
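For a small, explicitly enumerated hypothesis, Definition 8 can be evaluated by brute force over all pairs. The toy universe below (integers on a line, adjacent when they differ by one, with $f(x) = x^2$) is purely our illustration; it reproduces the phenomenon just described, where $D = 10 \in \mathcal{H} = \{0, 10\}$ has a neighbor $11 \notin \mathcal{H}$ that drives $LS_f(D)$ above $RS_f(\mathcal{H})$. The hypothesis is assumed to list distinct datasets, so every distance in the denominator is positive.

```python
from itertools import combinations


def restricted_sensitivity(f, hypothesis, dist):
    """RS_f(H) = max over distinct D1, D2 in H of |f(D1) - f(D2)| / d(D1, D2)."""
    return max(
        (abs(f(d1) - f(d2)) / dist(d1, d2) for d1, d2 in combinations(hypothesis, 2)),
        default=0.0,
    )


def local_sensitivity(f, d, neighbors):
    """LS_f(D) = max over D' ~ D of |f(D) - f(D')|."""
    return max(abs(f(d) - f(other)) for other in neighbors(d))


def square(x):
    return x * x


def line_dist(a, b):
    return abs(a - b)


def line_neighbors(x):
    return (x - 1, x + 1)


print(restricted_sensitivity(square, [0, 10], line_dist))  # 10.0
print(local_sensitivity(square, 10, line_neighbors))       # 21
```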
4 Using Restricted Sensitivity to Reduce Noise



To achieve differential privacy while adding noise proportional to $RS_f(\mathcal{H})$, we must be willing to sacrifice accuracy guarantees for datasets $D \notin \mathcal{H}$. Our goal is to create a new query $f_{\mathcal{H}}$ such that $f_{\mathcal{H}}(D) = f(D)$ for every $D \in \mathcal{H}$ ($f_{\mathcal{H}}$ is accurate when the hypothesis is correct) and $f_{\mathcal{H}}$ either has low global sensitivity or low $\beta$-smooth sensitivity over datasets $D \in \mathcal{H}$. In this section, we first give an inefficient generic construction of such $f_{\mathcal{H}}$, showing that it is always possible to devise $f_{\mathcal{H}}$ whose global sensitivity equals exactly the restricted sensitivity of $f$ over $\mathcal{H}$. We then show how, for the case of social networks and the hypothesis $\mathcal{H}_k$ that the network has bounded degree, we can efficiently construct functions $f_{\mathcal{H}_k}$ that approximately have this property.

4.1 A General Construction 

We now show how, given $\mathcal{H}$, to generically (but not efficiently) construct $f_{\mathcal{H}}$ whose global sensitivity
exactly equals the restricted sensitivity of $f$ over $\mathcal{H}$.

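Theorem 9. Given any query $f$ and any hypothesis $\mathcal{H} \subseteq \mathcal{D}$, we can construct a query $f_{\mathcal{H}}$ such that 1. $\forall D \in \mathcal{H}$ it holds that $f_{\mathcal{H}}(D) = f(D)$, and 2. $GS_{f_{\mathcal{H}}} = RS_f(\mathcal{H})$.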

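Proof. For each $D \in \mathcal{H}$ set $f_{\mathcal{H}}(D) = f(D)$. Now fix an arbitrary ordering of the set $\{D : D \notin \mathcal{H}\}$, and denote its elements as $D_1, D_2, \ldots, D_m$, where $m$ is the size of the set. For every $D \notin \mathcal{H}$ we define the value of $f_{\mathcal{H}}(D)$ inductively. Denote $T_i = \mathcal{H} \cup \{D_1, \ldots, D_i\}$. Initially, we are given the values of every $D \in T_0 = \mathcal{H}$. Given $i \ge 0$, we denote $\Delta_i = RS_{f_{\mathcal{H}}}(T_i)$. We now prove one can pick the value $f_{\mathcal{H}}(D_{i+1})$ in a way that preserves the invariant $\Delta_{i+1} = \Delta_i$. By applying the induction $m$ times we conclude that

$$RS_f(\mathcal{H}) = \Delta_0 = \Delta_m = RS_{f_{\mathcal{H}}}(\mathcal{D}) = GS_{f_{\mathcal{H}}}.$$

Fix $i \ge 0$. Observe that

$$\Delta_{i+1} = \max\left\{\Delta_i,\ \max_{D \in T_i} \frac{|f_{\mathcal{H}}(D) - f_{\mathcal{H}}(D_{i+1})|}{d(D, D_{i+1})}\right\},$$

so to preserve the invariant it suffices to find any value of $f_{\mathcal{H}}(D_{i+1})$ such that for every $D \in T_i$ we have $|f_{\mathcal{H}}(D) - f_{\mathcal{H}}(D_{i+1})| \le \Delta_i \cdot d(D, D_{i+1})$. Suppose for contradiction that no such value exists. Since closed intervals on the real line that pairwise intersect have a common point, there must be $D_1^*, D_2^* \in T_i$ whose intervals

$$\bigl[f_{\mathcal{H}}(D_1^*) - \Delta_i \cdot d(D_1^*, D_{i+1}),\ f_{\mathcal{H}}(D_1^*) + \Delta_i \cdot d(D_1^*, D_{i+1})\bigr]$$
$$\bigl[f_{\mathcal{H}}(D_2^*) - \Delta_i \cdot d(D_2^*, D_{i+1}),\ f_{\mathcal{H}}(D_2^*) + \Delta_i \cdot d(D_2^*, D_{i+1})\bigr]$$

don't intersect. By the triangle inequality this would imply that

$$\frac{|f_{\mathcal{H}}(D_1^*) - f_{\mathcal{H}}(D_2^*)|}{d(D_1^*, D_2^*)} \ge \frac{|f_{\mathcal{H}}(D_1^*) - f_{\mathcal{H}}(D_2^*)|}{d(D_1^*, D_{i+1}) + d(D_{i+1}, D_2^*)} > \Delta_i,$$

which contradicts the fact that $\Delta_i$ is the restricted sensitivity of $T_i$. □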
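The inductive step of the proof is constructive: the admissible values for $f_{\mathcal{H}}(D_{i+1})$ form the intersection of the intervals above, and we may pick, say, its left endpoint. The sketch below runs the whole construction on a toy universe of our own choosing (integers on a line with $d(a, b) = |a - b|$); it is feasible only because the universe is tiny, which is exactly the inefficiency discussed next.

```python
from itertools import combinations


def extend_value(known, target, dist, delta):
    """Choose f_H(target) with |f_H(D) - f_H(target)| <= delta * d(D, target) for all known D."""
    lower = max(val - delta * dist(d, target) for d, val in known.items())
    upper = min(val + delta * dist(d, target) for d, val in known.items())
    if lower > upper:  # impossible if delta really is the restricted sensitivity
        raise ValueError("empty intersection of intervals")
    return lower


def general_construction(f_on_h, outside, dist):
    """Extend f from H to H plus `outside`, processed in the given fixed order."""
    values = dict(f_on_h)
    delta = max(
        (abs(values[a] - values[b]) / dist(a, b) for a, b in combinations(values, 2)),
        default=0.0,
    )  # Delta_0 = RS_f(H)
    for d in outside:
        values[d] = extend_value(values, d, dist, delta)
    return values, delta


values, delta = general_construction({0: 0, 5: 10}, [1, 2, 3, 4], lambda a, b: abs(a - b))
print(delta, [values[x] for x in range(6)])  # 2.0 [0, 2.0, 4.0, 6.0, 8.0, 10]
```

On this line the global sensitivity of the extension is the largest jump between consecutive integers, which equals $\Delta_0 = 2$, as Theorem 9 predicts.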

4.2 Efficient Procedures for $\mathcal{H}_k$ via Projection Schemes

Unfortunately, the construction of Theorem 9 is highly inefficient. Furthermore, this construction deals with one query at a time. We would like a way to efficiently devise $f_{\mathcal{H}}$ a priori for any $f$. In this section, the way we devise $f_{\mathcal{H}}$ is by constructing a projection: a function $\mu : \mathcal{D} \to \mathcal{H}$ with the property that $\mu(D) = D$ for every $D \in \mathcal{H}$. Such a projection allows us to canonically convert any $f$ into $f_{\mathcal{H}}$ using the naive definition $f_{\mathcal{H}} = f \circ \mu$. Below we discuss various properties of projections that allow us to derive "good" $f_{\mathcal{H}}$'s. Following each property, we exhibit the existence of such projections $\mu$ for the specific case of social networks and $\mathcal{H} = \mathcal{H}_k$, the class of graphs of degree at most $k$.

Definition 10. The class $\mathcal{H}_k$ is defined as the set $\{G \in \mathcal{G} : \forall v,\ \deg(v) \le k\}$.
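Membership in $\mathcal{H}_k$ is a simple degree check. A minimal sketch, assuming a simple undirected graph given as a list of distinct edges (isolated vertices simply have degree 0):

```python
from collections import Counter


def max_degree(edges):
    """Largest vertex degree in a simple undirected graph given as an edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0)


def in_h_k(edges, k):
    """True iff the graph belongs to H_k, i.e. every vertex has degree at most k."""
    return max_degree(edges) <= k


path = [(0, 1), (1, 2), (2, 3)]
print(in_h_k(path, 1), in_h_k(path, 2))  # False True
```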

In many labeled graphs, it is reasonable to believe that $\mathcal{H}_k$ holds for $k \ll n$ because the degree distributions follow a power law. For example, the number of telephone numbers receiving $t$ calls in a day and the number of web pages with $t$ incoming links are both roughly proportional to $1/t^2$; for these networks it would suffice to set $k = O(\sqrt{n})$. The number of papers that receive $t$ citations is proportional to $1/t^3$, so we could set $k = O(n^{1/3})$ [10]. While the degrees on Facebook don't seem to follow a power law, the upper bound $k = 5{,}000$ seems reasonable [21]. By contrast, Facebook had approximately $n = 901{,}000{,}000$ users in June 2012 ($n > k^2$) [1].
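As a quick check of the parenthetical $n > k^2$ with the figures quoted above:

$$k^2 = 5000^2 = 2.5 \times 10^{7}, \qquad \frac{n}{k^2} = \frac{9.01 \times 10^{8}}{2.5 \times 10^{7}} \approx 36,$$

so $n$ exceeds $k^2$ by more than an order of magnitude.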



4.2.1 Smooth Projection 

The first property we discuss is perhaps the simplest and most coveted property a projection can
have: smoothness. Smoothness dictates that there exists a global bound on the distance between
the images of any two neighboring databases.

Definition 11. A projection $\mu : \mathcal{D} \to \mathcal{H}$ is called $c$-smooth if for any two neighboring databases $D \sim D'$ we have that $d(\mu(D), \mu(D')) \le c$.

Lemma 12. Let $\mu : \mathcal{D} \to \mathcal{H}$ be a $c$-smooth projection (i.e., for every $D \in \mathcal{H}$ we have $\mu(D) = D$). Then for every query $f$, the function $f_{\mathcal{H}} = f \circ \mu$ satisfies $GS_{f_{\mathcal{H}}} \le c \cdot RS_f(\mathcal{H})$.



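Proof.

$$
\begin{aligned}
GS_{f_{\mathcal{H}}} &= \max_{D_1 \sim D_2} \bigl|f_{\mathcal{H}}(D_1) - f_{\mathcal{H}}(D_2)\bigr| = \max_{D_1 \sim D_2} \bigl|f(\mu(D_1)) - f(\mu(D_2))\bigr| \cdot 1 \\
&\le \max_{D_1 \sim D_2} \frac{\bigl|f(\mu(D_1)) - f(\mu(D_2))\bigr| \cdot c}{d(\mu(D_1), \mu(D_2))} \\
&\le c \cdot \max_{D_1', D_2' \in \mathcal{H}} \frac{|f(D_1') - f(D_2')|}{d(D_1', D_2')} = c \cdot RS_f(\mathcal{H}).
\end{aligned}
$$

The first inequality uses $d(\mu(D_1), \mu(D_2)) \le c$ for neighbors $D_1 \sim D_2$; pairs with $\mu(D_1) = \mu(D_2)$ contribute $0$ and can be dropped. □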



As we now show, for $\mathcal{H} = \mathcal{H}_k$ and for distances defined via the edge-adjacency model, we can
devise an efficient smooth projection.

Claim 13. In the edge-adjacency model, there exists an efficiently computable 3-smooth projection
to $\mathcal{H}_k$.

The proof of the claim is deferred to the appendix. The high-level idea is to fix a canonical
ordering over all edges and then define $\mu$ to delete an edge $e$ if and only if there is a vertex $v$ such
that (1) $e$ is incident to $v$ and (2) $e$ is not one of the first $k$ edges incident to $v$. This ordering is then used
to achieve the smoothness guarantee. An immediate corollary of Lemma 12 and Claim 13 is the
following theorem.
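Following the high-level description above, here is a minimal sketch of such a projection. The lexicographic order on vertex pairs as the canonical edge ordering, the edge-list encoding, and the function names are our assumptions; the appendix proof is not reproduced here. An edge is kept exactly when it is among the first $k$ edges at both endpoints, so every vertex keeps at most $k$ edges, and graphs already in $\mathcal{H}_k$ are left unchanged.

```python
from collections import defaultdict
from itertools import combinations


def project_to_h_k(edges, k):
    """Edge-adjacency projection sketch: delete an edge unless it is among the first k
    edges (in canonical lexicographic order) incident to each of its two endpoints."""
    canonical = sorted({tuple(sorted(e)) for e in edges})
    incident = defaultdict(list)
    for e in canonical:  # incidence lists are built in canonical order
        incident[e[0]].append(e)
        incident[e[1]].append(e)
    first_k = {v: set(es[:k]) for v, es in incident.items()}
    return {e for e in canonical if e in first_k[e[0]] and e in first_k[e[1]]}


def all_graphs(n):
    """All simple graphs on vertices 0..n-1, as sets of sorted edge tuples."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield {p for i, p in enumerate(pairs) if mask >> i & 1}


k4 = list(combinations(range(4), 2))
print(sorted(project_to_h_k(k4, 2)))  # [(0, 1), (0, 2), (1, 2)]
```

Toggling one edge $\{u, v\}$ can only change the status of that edge and push at most one edge out of the first-$k$ window at $u$ and one at $v$, which is the intuition behind the constant 3; enumerating `all_graphs(4)` and toggling each edge is a quick way to confirm that the two projections never differ in more than 3 edges.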

Theorem 14. (Privacy w.r.t. Edge Changes) Given any query $f$ for social networks, the mechanism that uses the projection $\mu$ from Claim 13 and answers the query using $A(f, G) = f(\mu(G)) + \mathrm{Lap}\bigl(3 \cdot RS_f(\mathcal{H}_k)/\varepsilon\bigr)$ preserves $(\varepsilon, 0)$-differential privacy for any graph $G$.
Now, it is evident that this mechanism has the guarantee that for every $G \in \mathcal{H}_k$ it holds that $\Pr\bigl[|A(f, G) - f(G)| \le O(RS_f(\mathcal{H}_k)/\varepsilon)\bigr] \ge 2/3$. Furthermore, if the querier "lucked out" and asked a query $f$ for which $f(G)$ and $f(\mu(G))$ are close (say, identical), then the same guarantee holds for such $G$ as well. Note, however, that we cannot reveal to the querier whether $f(G)$ and $f(\mu(G))$ are indeed close, as such information might violate privacy.
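The mechanism of Theorem 14 is a thin wrapper around any projection. In the sketch below, `project` is a caller-supplied function (hypothetical here), and the usage substitutes the identity map on a toy graph that already lies in $\mathcal{H}_k$ together with the edge count, whose restricted sensitivity in the edge-adjacency model is 1. For such $G$ the error is exactly the Laplace noise, and $\Pr[|\mathrm{Lap}(b)| \le b \ln 3] = 1 - e^{-\ln 3} = 2/3$, which is one concrete reading of the $O(\cdot)$ guarantee above.

```python
import math
import random


def projected_laplace_mechanism(f, project, graph, rs, smoothness, epsilon, rng):
    """Release f(project(graph)) + Lap(smoothness * rs / epsilon)."""
    scale = smoothness * rs / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return f(project(graph)) + noise


# Hypothetical usage: edge count as the query, identity map standing in for the projection
# (acceptable only because this toy path graph is already in H_k for k >= 2).
graph = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
rng = random.Random(1)
answers = [projected_laplace_mechanism(len, lambda g: g, graph, rs=1.0, smoothness=3,
                                       epsilon=1.0, rng=rng) for _ in range(20000)]
b = 3 * 1.0 / 1.0
within = sum(abs(a - len(graph)) <= b * math.log(3) for a in answers) / len(answers)
print(round(within, 3))  # empirically close to 2/3
```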



4.2.2 Projections and Smooth Distances Estimators 

Unfortunately, smooth projections do not always exist, as the following toy example demonstrates. Fix $n$ graphs $G_1, \ldots, G_n$ with $d(G_i, G_j) = |i - j|$ for $1 \le i, j \le n$, and let $\mathcal{H} = \{G_1, G_n\}$. Because $\mu(G_1) = G_1$ and $\mu(G_n) = G_n$, there must exist some value $i$ such that $\mu(G_i) = G_1$ and $\mu(G_{i+1}) = G_n$; thus no $\mu$ can be $c$-smooth for $c < n - 1$.
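The pigeonhole argument can be confirmed exhaustively for small $n$. In the sketch below (our own illustration) the $n$ datasets are identified with $0, \ldots, n-1$ so that $d(i, j) = |i - j|$, $\mathcal{H}$ consists of the two endpoints, and we try every one of the $2^{n-2}$ projections.

```python
import math
from itertools import product


def best_smoothness_on_path(n):
    """Min over projections mu onto H = {0, n-1} of max_i d(mu(i), mu(i+1))."""
    best = math.inf
    for interior in product((0, n - 1), repeat=n - 2):
        mu = (0,) + interior + (n - 1,)  # mu must fix both elements of H
        best = min(best, max(abs(mu[i] - mu[i + 1]) for i in range(n - 1)))
    return best


print([best_smoothness_on_path(n) for n in range(2, 8)])  # [1, 2, 3, 4, 5, 6]
```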

Note that smooth projections have the property that they also provide a constant-factor approximation of the distance of $D$ to $\mathcal{H}$: following a shortest path from $D$ to $\mathcal{H}$, for every $D$ we have $d(D, \mathcal{H}) \le d(D, \mu(D)) \le (c + 1) \cdot d(D, \mathcal{H})$. In the vertex-adjacency model, however, we cannot have an efficiently computable $O(1)$-smooth projection since, as we show in the appendix, it is NP-hard to approximate $d(G, \mathcal{H}_k)$ (see Claim 23), even though there does exist an efficient approximation scheme for this distance (see the appendix). Yet, we show that it is possible to devise a somewhat relaxed projection such that the distance between a database and its mapped image is a smooth function. To that end, we relax the definition of projection a little, allowing it to map instances to some predefined $\bar{\mathcal{H}} \supseteq \mathcal{H}$.

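Definition 15. Fix $\bar{\mathcal{H}} \supseteq \mathcal{H}$. Let $\mu$ be a projection of $\bar{\mathcal{H}}$, that is, a mapping $\mu : \mathcal{D} \to \bar{\mathcal{H}}$ that maps every element of $\mathcal{H}$ to itself ($\forall D \in \mathcal{H}$ we have $\mu(D) = D$). A $c$-smooth distance estimator is a function $\hat d_\mu : \mathcal{D} \to \mathbb{R}$ that satisfies all of the following. (1) For every $D \in \mathcal{H}$ it is defined as $\hat d_\mu(D) = 0$. (2) It is lower bounded by the distance of $D$ to its projection: $\forall D \in \mathcal{D}$, $\hat d_\mu(D) \ge d(D, \mu(D))$. (3) Its value over neighboring databases changes by at most $c$: $\forall D \sim D'$, $|\hat d_\mu(D) - \hat d_\mu(D')| \le c$.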

It is simple to verify that for every $D \in \mathcal{D}$ we have $\hat d_\mu(D) \le c \cdot d(D, \mathcal{H})$ (using induction on $d(D, \mathcal{H})$). We omit the subscript $\mu$ when the projection is clear from context.

The following lemma suggests that a smooth distance estimator allows us to devise a good
smooth upper bound on the local sensitivity, thus allowing us to apply the smooth sensitivity
scheme of [18].



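Lemma 16. Fix $\bar{\mathcal{H}} \supseteq \mathcal{H}$ and let $\mu : \mathcal{D} \to \bar{\mathcal{H}}$ be a projection of $\bar{\mathcal{H}}$. Let $\hat d : \mathcal{D} \to \mathbb{R}$ be an efficiently computable $c$-smooth distance estimator. Then for every query $f$, we can define the composition $f_{\mathcal{H}} = f \circ \mu$ and define the function

$$S_{f_{\mathcal{H}},\beta}(D) = \max_{d \in \mathbb{Z},\ d \ge \hat d(D)} \exp\!\left(-\frac{\beta}{c}\bigl(d - \hat d(D)\bigr)\right)(2d + c + 1) \cdot RS_f(\bar{\mathcal{H}}).$$

Then $S_{f_{\mathcal{H}},\beta}$ is an efficiently computable $\beta$-smooth upper bound on the local sensitivity of $f_{\mathcal{H}}$. Furthermore, define $g$ as the function

$$g(x) = \begin{cases} \dfrac{2}{x}\, e^{-1 + x(c+1)/2}, & 0 < x < \dfrac{2}{c+1}, \\[6pt] c + 1, & x \ge \dfrac{2}{c+1}. \end{cases}$$

Then for every $D$ it holds that

$$S_{f_{\mathcal{H}},\beta}(D) \le \exp\!\left(\frac{\beta}{c}\, \hat d(D)\right) \cdot g(\beta/c) \cdot RS_f(\bar{\mathcal{H}}).$$

The proof of Lemma 16 is deferred to the appendix. Like in the edge-adjacency model, we now exhibit a projection and a smooth distance estimator for the vertex-adjacency model.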
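A sketch of how the smooth upper bound of Lemma 16 can be evaluated, reading it as $S(D) = \max_{d \in \mathbb{Z},\, d \ge \hat d(D)} e^{-(\beta/c)(d - \hat d(D))}(2d + c + 1) \cdot RS$, with $RS$ standing for the restricted sensitivity supplied by the caller. The search stops at the first non-increasing term; this relies on the ratio between consecutive terms being decreasing in $d$ and requires $\beta > 0$. The helper `g` is the closed form above; both names are ours.

```python
import math


def smooth_upper_bound(d_hat, beta, c, rs):
    """S(D) = max over integers d >= d_hat of exp(-(beta/c)(d - d_hat)) (2d + c + 1) * rs.

    Consecutive terms have ratio exp(-beta/c) * (2d + c + 3) / (2d + c + 1), which
    decreases in d, so the terms are unimodal and we can stop at the first non-increase.
    """
    x = beta / c
    d = math.ceil(d_hat)
    best = 0.0
    while True:
        term = math.exp(-x * (d - d_hat)) * (2 * d + c + 1)
        if term <= best:
            return best * rs
        best = term
        d += 1


def g(x, c):
    """Closed form of the maximum over real d >= 0 of exp(-x d) (2d + c + 1)."""
    if x < 2.0 / (c + 1):
        return (2.0 / x) * math.exp(-1.0 + x * (c + 1) / 2.0)
    return float(c + 1)


c, beta, rs = 4, 0.1, 1.0
print(smooth_upper_bound(0, beta, c, rs), g(beta / c, c) * rs)
```

For a graph in the hypothesis ($\hat d = 0$) the two printed values nearly coincide, since restricting $d$ to integers loses very little relative to the continuous maximum.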
Claim 17. In the vertex-adjacency model, there exists a projection $\mu : \mathcal{G} \to \mathcal{H}_{2k}$ and a 4-smooth distance estimator $\hat d$, both of which are efficiently computable.

To construct $\mu$ and $\hat d$ we start with the linear program that determines a "fractional distance" from a graph to $\mathcal{H}_k$. This LP has $n + \binom{n}{2}$ variables: $x_v$, which intuitively represents whether $v$ ought to be removed from the graph or not, and $w_{u,v}$, which represents whether the edge between $u$ and $v$ remains in the projected graph or not. We also use the notation $a_{u,v}$, where $a_{u,v} = 1$ if the edge $\{u, v\}$ is in $G$; otherwise $a_{u,v} = 0$.

$$
\begin{aligned}
\min\ & \textstyle\sum_{v} x_v \quad \text{s.t.} \\
& (1)\ \forall v,\ x_v \ge 0 \qquad (2)\ \forall u, v,\ w_{u,v} \ge 0 \\
& (3)\ \forall u, v,\ w_{u,v} \ge a_{u,v} - x_u - x_v \qquad (4)\ \forall u,\ \textstyle\sum_{v} w_{u,v} \le k
\end{aligned}
$$

To convert our fractional solution $(x^*, w^*)$ to a graph $\mu(G) \in \mathcal{H}_{2k}$, we define $\mu(G)$ to be the graph we get by removing every edge $(u, v) \in E(G)$ whose endpoints satisfy $x_u^* > 1/4$ or $x_v^* > 1/4$. We define our distance estimator as $\hat d(G) = 4 \sum_v x_v^*$. In the appendix we show that $\mu$ and $\hat d$ satisfy the conditions of Claim 17.
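A sketch of the rounding step only, assuming a fractional solution $(x^*, w^*)$ has already been obtained from some LP solver (not shown). `lp_is_feasible` checks constraints (1) to (4) and `round_solution` applies the $1/4$ threshold and returns $\hat d(G) = 4\sum_v x_v^*$; encoding edges and $w$ keys as sorted vertex pairs is our assumption. The degree bound needs only feasibility: a kept edge has $x_u^*, x_v^* \le 1/4$, hence $w_{u,v}^* \ge 1/2$, and constraint (4) then allows at most $2k$ kept edges per vertex. The smoothness of $\hat d$ claimed above concerns the optimal solution.

```python
from itertools import combinations


def lp_is_feasible(n, edges, k, x, w, tol=1e-9):
    """Check LP constraints (1)-(4) for a candidate (x, w); edges and w use sorted pairs."""
    if any(xv < -tol for xv in x):                               # (1)
        return False
    for u, v in combinations(range(n), 2):
        a_uv = 1.0 if (u, v) in edges else 0.0
        w_uv = w.get((u, v), 0.0)
        if w_uv < -tol or w_uv < a_uv - x[u] - x[v] - tol:        # (2) and (3)
            return False
    for u in range(n):
        load = sum(w.get((min(u, v), max(u, v)), 0.0) for v in range(n) if v != u)
        if load > k + tol:                                        # (4)
            return False
    return True


def round_solution(edges, x, threshold=0.25):
    """mu(G): drop every edge with an endpoint of weight x_v > 1/4; d_hat(G) = 4 * sum(x)."""
    kept = {(u, v) for u, v in edges if x[u] <= threshold and x[v] <= threshold}
    return kept, 4 * sum(x)


# Path 0-1-2 with k = 1: vertex 1 has degree 2, and x = (0, 0.5, 0) is feasible.
path = {(0, 1), (1, 2)}
x, w = [0.0, 0.5, 0.0], {(0, 1): 0.5, (1, 2): 0.5}
print(lp_is_feasible(3, path, 1, x, w), round_solution(path, x))  # True (set(), 2.0)
```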

As before, combining Lemma 16 with Claim 17 gives the following theorem as an immediate
corollary.

Theorem 18. (Privacy w.r.t. Vertex Adjacency) Given any query $f$ for social networks, the mechanism that uses the projection $\mu$ from Claim 17 and the $\beta$-smooth upper bound of Lemma 16, and answers the query using $A(f, G) = f(\mu(G)) + \mathrm{Lap}\bigl(2 \cdot S_{f_{\mathcal{H}},\beta}(G)/\varepsilon\bigr)$ with $\beta = -\varepsilon/(2\ln\delta)$, preserves $(\varepsilon, \delta)$-differential privacy for any graph $G$.

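Again, it is evident from the definition that the algorithm has the guarantee that for every $G \in \mathcal{H}_k$ it holds that $\Pr\bigl[|A(f, G) - f(G)| \le O\bigl(g(\beta/4) \cdot RS_f(\mathcal{H}_{2k})/\varepsilon\bigr)\bigr] \ge 2/3$, where $c = 4$ is the smoothness of the distance estimator of Claim 17.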

5 Restricted Sensitivity and $\mathcal{H}_k$

Now that we have constructed the machinery of restricted sensitivity, we compare the restricted sensitivity over $\mathcal{H}_k$ with smooth sensitivity for specific types of queries, in order to demonstrate the benefits of our approach. In a nutshell, restricted sensitivity offers a significant advantage over smooth sensitivity whenever $k \ll n$; i.e., we show that there are queries $f$ such that for some $G \in \mathcal{H}_k$ it holds that $RS_f(\mathcal{H}_k) \ll S_{f,\beta}(G)$.

We now define two types of queries. First, let us introduce some notation. A profile is a function $p$ that maps a vertex $v$ in a social network $(G, \ell)$ to $[0, 1]$. Given a set of vertices $\{v_1, v_2, \ldots, v_t\}$, we denote by $G[v_1, v_2, \ldots, v_t]$ the social network derived by restricting $G$ and $\ell$ to these $t$ vertices. We use $G_v = G[\{v\} \cup \{w \mid \{v, w\} \in E(G)\}]$ to denote the social network derived by restricting $G$ and $\ell$ to $v$ and its neighbors. A local profile satisfies the constraint $p(v, (G, \ell)) = p(v, G_v)$.

Definition 19. A (local) profile query $f_p(G, \ell) = \sum_{v \in V(G)} p(v, (G, \ell))$ sums the (local) profile $p$
across all nodes.
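A minimal sketch of evaluating a local profile query, under our own encoding (vertex count, edge list, label dictionary). Locality is enforced by construction: the profile callback only ever receives the edges and labels of $G_v$. The example profile, "has a neighbor labeled spy", is the one used in the smooth-sensitivity example discussed below.

```python
def local_profile_query(n, edges, labels, profile):
    """f_p(G, l) = sum over v of p(v, G_v); the profile only ever sees G_v."""
    neighbors = {v: set() for v in range(n)}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    total = 0.0
    for v in range(n):
        ball = {v} | neighbors[v]
        local_edges = {(a, b) for a, b in edges if a in ball and b in ball}  # edges of G_v
        local_labels = {u: labels[u] for u in ball}                          # labels on G_v
        total += profile(v, local_edges, local_labels)
    return total


def friend_of_spy(v, local_edges, local_labels):
    """Profile in [0, 1]: 1 if v has a neighbor labeled "spy", else 0."""
    return 1.0 if any(u != v and lab == "spy" for u, lab in local_labels.items()) else 0.0


n = 6
labels = {0: "spy", **{v: "civilian" for v in range(1, n)}}
star = [(0, v) for v in range(1, n)]
print(local_profile_query(n, [], labels, friend_of_spy))    # 0.0
print(local_profile_query(n, star, labels, friend_of_spy))  # 5.0
```

The edgeless graph and the star differ only at vertex 0, so they are vertex-adjacent, yet the answer jumps by $n - 1 = 5$.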

Local profile queries are a natural extension of predicates to social networks, which can be used to study many interesting properties of a social network like clustering coefficients [17, 22], local bridges [10, 12] and 2-betweenness [11]. Further discussion can be found in Section C of the appendix. Claim 20 bounds the restricted sensitivity of a local profile query over $\mathcal{H}_k$ (intuitively, in the vertex adjacency model a node $v$ can at worst affect the local profiles of itself, its $k$ old neighbors and its $k$ new neighbors). A formal proof of Claim 20 is deferred to the appendix.

Claim 20. For any local profile query $f$, we have that $RS_f(\mathcal{H}_k) \le 2k + 1$ in the vertex adjacency
model and $RS_f(\mathcal{H}_k) \le k + 1$ in the edge adjacency model.
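Claim 20 can be sanity-checked by brute force on tiny instances. The sketch below is our own illustration, not the appendix proof: it keeps the labels fixed (vertices 0 and 1 are spies), so that distances reduce to $|E(G_1) \triangle E(G_2)|$ in the edge-adjacency model and to a minimum vertex cover of the difference graph in the vertex-adjacency model. It then enumerates all of $\mathcal{H}_k$ on 4 vertices and computes the exact restricted sensitivity of the "friends with a spy" query, for comparison with the two bounds.

```python
from itertools import combinations


def all_graphs(n):
    """All simple graphs on vertices 0..n-1 as frozensets of sorted edge tuples."""
    pairs = list(combinations(range(n), 2))
    for mask in range(1 << len(pairs)):
        yield frozenset(p for i, p in enumerate(pairs) if mask >> i & 1)


def max_degree(n, edges):
    return max(sum(1 for e in edges if v in e) for v in range(n))


def spy_friends(n, edges, spies):
    """Local profile query: number of vertices with at least one spy neighbor."""
    return sum(1 for v in range(n)
               if any((min(v, s), max(v, s)) in edges for s in spies if s != v))


def edge_distance(n, e1, e2):
    return len(e1 ^ e2)  # labels are fixed, so U is empty


def vertex_distance(n, e1, e2):
    """Minimum vertex cover of the difference graph (labels fixed, so U is empty)."""
    diff = e1 ^ e2
    for size in range(n + 1):
        for cover in combinations(range(n), size):
            if all(u in cover or v in cover for u, v in diff):
                return size


def restricted_sensitivity(n, k, spies, dist):
    """Brute-force RS_f(H_k) over all graphs on n vertices with max degree <= k."""
    h_k = [g for g in all_graphs(n) if max_degree(n, g) <= k]
    f = {g: spy_friends(n, g, spies) for g in h_k}
    return max(abs(f[a] - f[b]) / dist(n, a, b) for a, b in combinations(h_k, 2))


for k in (1, 2):
    print(k, restricted_sensitivity(4, k, {0, 1}, vertex_distance), 2 * k + 1,
          restricted_sensitivity(4, k, {0, 1}, edge_distance), k + 1)
```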

By contrast, the smooth sensitivity of a local profile query may be as large as $\Omega(n)$ even for
graphs in $\mathcal{H}_k$. Consider the local profile query "how many people are friends with a spy?" The
$(n-1)$-star graph $G_1$, in which a spy $v$ is friends with everyone, is adjacent to the empty graph
$G_0 \in \mathcal{H}_k$, and $f(G_1) - f(G_0) = n - 1$. Therefore, any smooth upper bound $S_{f,\beta}$ must have $S_{f,\beta}(G_0) \ge LS_f(G_0) \ge n - 1$.
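The spy example can be checked numerically. In this sketch vertex 0 plays the spy and the helper names are our own; the empty graph $G_0$ and the $(n-1)$-star $G_1$ differ only in the edges of vertex 0, so they are neighbors in the vertex adjacency model, and the query jumps by exactly $n-1$ between them.

```python
def count_friends_of_spies(adj, labels):
    """Local profile query: number of vertices with at least one neighbor labeled 'spy'."""
    return sum(1 for v in adj if any(labels[u] == "spy" for u in adj[v]))


def spy_star_gap(n):
    """f(G_1) - f(G_0) for the empty graph G_0 and the (n-1)-star G_1 centered at the spy 0.

    G_0 has maximum degree 0, so it lies in H_k for every k.
    """
    labels = {i: ("spy" if i == 0 else "civilian") for i in range(n)}
    g0 = {i: set() for i in range(n)}
    # In G_1 every vertex i > 0 is adjacent to the spy, and the spy to everyone else.
    g1 = {i: ({0} if i else set(range(1, n))) for i in range(n)}
    return count_friends_of_spies(g1, labels) - count_friends_of_spies(g0, labels)
```

So `spy_star_gap(n)` returns `n - 1`, which lower-bounds $LS_f(G_0)$ and hence any smooth upper bound at $G_0$.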

Subgraph counting queries allow us to ask questions such as "how many triplets of people are all friends
when two of them are doctors and the other is a pop-singer?" or "how many paths of length 2 are
there connecting a spy and a pop-singer over 40?" The average clustering coefficient of a graph can
be computed from the number of triangles and 2-stars in a graph.
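To illustrate the last sentence, the sketch below counts triangles and 2-stars (paths of length 2, counted by their middle vertex) and combines them into the global clustering coefficient, also called transitivity, $3 \cdot \#\text{triangles} / \#\text{2-stars}$. This is one standard clustering statistic that depends on exactly these two counts; the function names are illustrative.

```python
from itertools import combinations


def triangles_and_two_stars(adj):
    """Return (#triangles, #2-stars) of an undirected graph given as adjacency sets."""
    tri_times_3 = 0
    two_stars = 0
    for v, nbrs in adj.items():
        d = len(nbrs)
        two_stars += d * (d - 1) // 2  # pairs of neighbors = 2-stars centered at v
        # each triangle is seen once from each of its three vertices
        tri_times_3 += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return tri_times_3 // 3, two_stars


def transitivity(adj):
    """Global clustering coefficient 3 * triangles / 2-stars (0.0 if there are no 2-stars)."""
    t, s = triangles_and_two_stars(adj)
    return 3 * t / s if s else 0.0
```

On $K_4$ this gives 4 triangles and 12 two-stars, so the coefficient is 1.0.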

Definition 21. A subgraph counting query $f = (H, \bar{p})$ is given by a connected graph $H$ over $t$
vertices and $t$ predicates $p_1, p_2, \ldots, p_t$. Given a social network $\langle G, \ell \rangle$, the answer $f(\langle G, \ell \rangle)$ is the
size of the set $\{v_1, v_2, \ldots, v_t : G[v_1, v_2, \ldots, v_t] = H \text{ and } \forall i,\ \ell(v_i) \in p_i\}$.
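A brute-force reading of Definition 21, as a sketch: `h_edges` lists the edges of $H$ on vertices $0, \ldots, t-1$, and `preds[i]` stands for $p_i$. We interpret $G[v_1, \ldots, v_t] = H$ as "the map $i \mapsto v_i$ is an isomorphism onto the induced subgraph" and count ordered tuples of distinct vertices; both are our assumptions, and counting unordered copies instead would divide by the number of automorphisms of $H$. The running time is $O(n^t)$, so this is only for toy graphs.

```python
from itertools import permutations


def subgraph_count(adj, labels, h_edges, preds):
    """Count ordered tuples (v_1..v_t) of distinct vertices such that
    G[v_1..v_t] equals H under i -> v_i and p_i(l(v_i)) holds for every i."""
    t = len(preds)
    h = {frozenset(e) for e in h_edges}
    count = 0
    for tup in permutations(adj, t):
        if not all(preds[i](labels[tup[i]]) for i in range(t)):
            continue
        # induced-subgraph test: {v_i, v_j} is an edge of G iff {i, j} is an edge of H
        if all((tup[j] in adj[tup[i]]) == (frozenset((i, j)) in h)
               for i in range(t) for j in range(i + 1, t)):
            count += 1
    return count
```

For example, with $H = K_3$, predicates "doctor, doctor, pop-singer" and a triangle whose vertices carry exactly these labels, the count is 2 (the two doctors can be listed in either order).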

The smooth sensitivity of a subgraph counting query may be as high as $\Omega(n^{t-1})$ in the vertex
adjacency model. Let $f = (H, \bar{p})$ be a subgraph counting query where $H$ is a $t$-star and each
predicate $p_i$ is identically true. Let $G_1$ be an $n$-star ($f(G_1) = \binom{n}{t-1}$). Then in the vertex adjacency
model there is a neighboring graph $G_2$ with no edges ($f(G_2) = 0$). We have that $LS_f(G_2) \ge \binom{n}{t-1}$.
Observe that $G_2 \in \mathcal{H}_k$. In the appendix we show that the smooth sensitivity of $f = (K_3, \bar{p})$ is
always at least $e^{-2\beta}(n-2)$ when each predicate $p_i$ is identically true (see Claim 25). By contrast, Claim
22 bounds the restricted sensitivity of subgraph counting queries. The proof is deferred to the
appendix.

Claim 22. Let $f = (H, \bar{p})$ be a subgraph counting query and let $t = |H|$. Then $RS_f(\mathcal{H}_k) \le t k^{t-1}$ in
both the edge adjacency model and the vertex adjacency model.

6 Future Questions/Directions 

Efficient Mappings: While we can show that there is no efficiently computable $O(1)$-smooth
projection $\mu : \mathcal{G} \to \mathcal{H}_k$, we do not know whether the construction of Claim 17 can be
improved. That is, there could be a mapping $\mu : \mathcal{G} \to \mathcal{H}$ for some $\mathcal{H} \supseteq \mathcal{H}_k$ for which the solution
itself (the set of vertices that dominate the removed edges) is smooth. In other words, is there an
efficiently computable mapping $\mu : \mathcal{G} \to \mathcal{H} \supseteq \mathcal{H}_k$ which satisfies $|d(\mu(G_1), G_1) - d(\mu(G_2), G_2)| \le c$
for some constant $c$ and all neighboring $G_1, G_2$?

Multiple Queries: We primarily focus on improving the accuracy of
a single query $f$. Could the notion of restricted sensitivity be used in conjunction with other
mechanisms (e.g., BLR [6], the Private Multiplicative Weights mechanism [13], etc.) to accurately
answer an entire class of queries?

Alternate Hypotheses: We focused on the specific hypothesis
$\mathcal{H}_k$. What other natural hypotheses could be used to restrict sensitivity in private data analysis?
Given such a hypothesis $\mathcal{H}$, can we efficiently construct a query $f_{\mathcal{H}}$ with low global sensitivity, or
with low smooth sensitivity over datasets $D \in \mathcal{H}$?
A Missing Proofs



Reminder of Claim 13. In the edge-adjacency model, there exists an efficient way to compute
a 3-smooth projection $\mu$ to $\mathcal{H}_k$.

Proof of Claim 13. We construct our smooth projection $\mu$ by first fixing a canonical ordering over
all possible edges. Let $e^v_1, \ldots, e^v_{\deg(v)}$ denote the edges incident to $v$ in canonical order. For each edge
$e = \{u, v\}$ we delete $e$ if and only if (i) $e = e^u_j$ for some $j > k$ or (ii) $e = e^v_j$ for some $j > k$. (Intuitively, for each
$v$ with $\deg(v) > k$ we keep the first $k$ edges incident to $v$ and flag the other edges for deletion.) If
$G \in \mathcal{H}_k$ then no edges are deleted, so $\mu(G) = G$. Suppose that $G_1, G_2$ are neighbors differing on
one edge $e = \{x, y\}$ (wlog, say that $e$ is in $G_1$). Observe that for every $v \neq x, y$, the edges
incident to $v$ have the same canonical positions in $G_1$ and $G_2$, so the same edges are flagged at $v$.
At $x$, adding $e$ moves each later edge down by one position, so at most one edge $e_x$ (incident to $x$),
namely the edge at position $k$ in $G_2$, is flagged at $x$ in $G_1$ but not in $G_2$; similarly, there is at most
one such edge $e_y$ (incident to $y$). Hence $\mu(G_1)$ and $\mu(G_2)$ can differ only on $e$, $e_x$ and $e_y$, and

$$d(\mu(G_1), \mu(G_2)) \le 3.$$

□
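The projection from this proof is easy to implement. The sketch assumes integer vertex ids and uses lexicographic order on sorted vertex pairs as the canonical edge ordering (any fixed ordering would do); `edge_projection` is a name of our choosing. Processing edges in canonical order and bumping a per-vertex counter gives each edge its position at both endpoints.

```python
def edge_projection(edges, k):
    """Projection mu of Claim 13 (sketch): keep edge {u, v} iff it is among the
    first k edges, in canonical order, at u and also at v."""
    canon = sorted(tuple(sorted(e)) for e in edges)  # canonical edge order
    position = {}  # position[(e, x)] = rank of e among the edges incident to x
    seen = {}
    for e in canon:
        for x in e:
            seen[x] = seen.get(x, 0) + 1
            position[(e, x)] = seen[x]
    return {e for e in canon if position[(e, e[0])] <= k and position[(e, e[1])] <= k}
```

In the edge adjacency model the distance between two edge sets is the size of their symmetric difference, so 3-smoothness can be spot-checked by removing one edge from a random graph and comparing `len(mu1 ^ mu2)` with 3.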

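Reminder of Lemma 16. Fix $\mathcal{H} \subseteq \mathcal{D}$ and let $\mu : \mathcal{D} \to \mathcal{H}$ be a projection of $\mathcal{H}$. Let $\hat{d} : \mathcal{D} \to \mathbb{R}$
be an efficiently computable $c$-smooth distance estimator. Then for every query $f$, we can define
the composition $f_{\mathcal{H}} = f \circ \mu$ and define the function

$$S_{f_{\mathcal{H}},\beta}(D) = \max_{d \in \mathbb{R},\ d \ge \hat{d}(D)} \exp\left(-\frac{\beta}{c}\left(d - \hat{d}(D)\right)\right) (2d + c + 1) \cdot RS_f(\mathcal{H}).$$

Then $S_{f_{\mathcal{H}},\beta}$ is an efficiently computable $\beta$-smooth upper bound on the local sensitivity of $f_{\mathcal{H}}$.
Furthermore, define $g$ as the function

$$g(x) = \begin{cases} \dfrac{2}{x}\, e^{\frac{(c+1)x}{2} - 1}, & 0 < x < \frac{2}{c+1}, \\ c + 1, & x \ge \frac{2}{c+1}. \end{cases}$$

Then for every $D$ it holds that

$$S_{f_{\mathcal{H}},\beta}(D) \le \exp\left(\frac{\beta}{c}\hat{d}(D)\right) \cdot g(\beta/c) \cdot RS_f(\mathcal{H}).$$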

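Proof of Lemma 16. First, we show that $S_{f_{\mathcal{H}},\beta}$ is indeed an upper bound on the local sensitivity
of $f_{\mathcal{H}}$. Fix any $D \in \mathcal{D}$; then, with $D'$ ranging over the neighbors of $D$,

$$\begin{aligned}
LS_{f_{\mathcal{H}}}(D) &= \max_{D'} |f_{\mathcal{H}}(D) - f_{\mathcal{H}}(D')| = \max_{D'} |f(\mu(D)) - f(\mu(D'))| \\
&\le \max_{D'} RS_f(\mathcal{H}) \cdot d(\mu(D), \mu(D')) \\
&\le \max_{D'} RS_f(\mathcal{H}) \cdot \left(d(D, \mu(D)) + d(D, D') + d(D', \mu(D'))\right) \\
&\le RS_f(\mathcal{H}) \cdot \left(\hat{d}(D) + 1 + \max_{D'} \hat{d}(D')\right) \le RS_f(\mathcal{H}) \cdot \left(2\hat{d}(D) + c + 1\right) \\
&\le \max_{d \ge \hat{d}(D)} e^{-\frac{\beta}{c}(d - \hat{d}(D))} (2d + c + 1)\, RS_f(\mathcal{H}).
\end{aligned}$$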

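Next we prove that $S_{f_{\mathcal{H}},\beta}$ is $\beta$-smooth. Let $D_1$ and $D_2$ be two neighboring databases, and wlog
assume $\hat{d}(D_2) \le \hat{d}(D_1)$. (In this case $S_{f_{\mathcal{H}},\beta}(D_2) \le S_{f_{\mathcal{H}},\beta}(D_1)$, since every term in the maximum defining
$S_{f_{\mathcal{H}},\beta}(D_2)$ is at most either the term for the same $d$ or the term for $d = \hat{d}(D_1)$ in the maximum defining
$S_{f_{\mathcal{H}},\beta}(D_1)$; so it suffices to bound the ratio below.) Then

$$\frac{S_{f_{\mathcal{H}},\beta}(D_1)}{S_{f_{\mathcal{H}},\beta}(D_2)} = \frac{\max_{d \ge \hat{d}(D_1)} \exp\left(-\frac{\beta}{c}(d - \hat{d}(D_1))\right)(2d + c + 1)\, RS_f(\mathcal{H})}{\max_{d \ge \hat{d}(D_2)} \exp\left(-\frac{\beta}{c}(d - \hat{d}(D_2))\right)(2d + c + 1)\, RS_f(\mathcal{H})}.$$

Let $d_0$ be the value of $d$ at which the maximum of the numerator is obtained. Since $d_0 \ge \hat{d}(D_1) \ge \hat{d}(D_2)$,
the value $d_0$ is also feasible in the denominator, so

$$\frac{S_{f_{\mathcal{H}},\beta}(D_1)}{S_{f_{\mathcal{H}},\beta}(D_2)} \le \frac{\exp\left(-\frac{\beta}{c}(d_0 - \hat{d}(D_1))\right)(2d_0 + c + 1)\, RS_f(\mathcal{H})}{\exp\left(-\frac{\beta}{c}(d_0 - \hat{d}(D_2))\right)(2d_0 + c + 1)\, RS_f(\mathcal{H})} = \exp\left(-\frac{\beta}{c}\left(\hat{d}(D_2) - \hat{d}(D_1)\right)\right) \le \exp(\beta),$$

where the last inequality uses the smoothness property, i.e., that $\hat{d}(D_2) - \hat{d}(D_1) \ge -c$.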
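Finally, we wish to prove the global upper bound on $S_{f_{\mathcal{H}},\beta}$, i.e., that for every $D \in \mathcal{D}$

$$S_{f_{\mathcal{H}},\beta}(D) \le \exp\left(\frac{\beta}{c}\hat{d}(D)\right) \cdot g(\beta/c) \cdot RS_f(\mathcal{H}).$$

Fix $D$ and define $h(x) = \exp\left(-\frac{\beta}{c}x\right)(2x + c + 1)$, so that $S_{f_{\mathcal{H}},\beta}(D) = \exp\left(\frac{\beta}{c}\hat{d}(D)\right) RS_f(\mathcal{H}) \cdot \max_{d \ge \hat{d}(D)} h(d)$.
Taking the derivative of $h$ we have

$$h'(x) = e^{-\frac{\beta}{c}x}\left(-\frac{2\beta}{c}x - \beta - \frac{\beta}{c} + 2\right),$$

which means that $h(x)$ is maximized at $x_0 = \frac{c}{\beta} - \frac{c+1}{2}$. In the case that $x_0 \le 0$ (i.e., for $\beta/c \ge \frac{2}{c+1}$),
$h$ is non-increasing on $[0, \infty)$, so we can upper bound $h(x)$ by $h(0) = c + 1$ for every $x \ge 0$. Otherwise, we have that
$h(x) \le h(x_0)$ for every $x \ge 0$, and indeed $h(x_0) = \frac{2c}{\beta} e^{\frac{\beta(c+1)}{2c} - 1} = g(\beta/c)$.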
To conclude the proof, observe that computing $S_{f_{\mathcal{H}},\beta}(D)$ is just a simple optimization once
$\hat{d}(D)$ is known, much like the derivation done above. So since $\hat{d}$ is efficiently computable,
$S_{f_{\mathcal{H}},\beta}$ is efficiently computable. □
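The "simple optimization" can be written in closed form. Because $h$ increases up to $x_0$ and decreases afterwards, the maximum over $d \ge \hat d(D)$ sits at $\max(\hat d(D), x_0)$. The sketch below (names `smooth_bound` and `g` are ours; `rs` stands for $RS_f(\mathcal{H})$) evaluates $S_{f_{\mathcal{H}},\beta}$ this way and lets one spot-check the three properties proved above.

```python
import math


def smooth_bound(d_hat, c, beta, rs):
    """S(D) = max_{d >= d_hat} exp(-beta/c * (d - d_hat)) * (2d + c + 1) * rs.

    h(x) = exp(-beta/c * x) * (2x + c + 1) is unimodal with peak at
    x0 = c/beta - (c + 1)/2, so the constrained maximum is at max(d_hat, x0).
    """
    x0 = c / beta - (c + 1) / 2
    d_star = max(d_hat, x0)
    return math.exp(-beta / c * (d_star - d_hat)) * (2 * d_star + c + 1) * rs


def g(x, c):
    """The function g from Lemma 16 (defined for x > 0)."""
    if x < 2 / (c + 1):
        return (2 / x) * math.exp((c + 1) * x / 2 - 1)
    return c + 1.0
```

As expected, `smooth_bound` is non-decreasing in $\hat d$, and moving $\hat d$ by at most $c$ changes it by a factor of at most $e^{\beta}$.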

Reminder of Claim 17. In the vertex-adjacency model, there exists a projection $\mu : \mathcal{G} \to \mathcal{H}_{2k}$
and a 4-smooth distance estimator $\hat{d}$, both of which are efficiently computable.



Proof of Claim 17. We first prove that $\mu$ is a projection mapping every graph to a graph in $\mathcal{H}_{2k}$.
Suppose that some $v \in G$ has degree $> 2k$ in $\mu(G)$. Then clearly $x^*_v < 1/4$, for otherwise we would have
removed all of the edges touching $v$. Observe that every edge $\{u, v\}$ we keep has $w^*_{u,v} \ge 1 - 1/4 - 1/4 = 1/2$.
Consequently, $v$ can have at most $2k$ incident edges with $w^*_{u,v} \ge 1/2$, because of the constraint $\sum_u w^*_{u,v} \le k$.
So there are at most $2k$ edges incident to $v$ in $\mu(G)$, a contradiction.

Now, let us prove that $\hat{d}$ satisfies all of the requirements of a 4-smooth distance estimator. First,
if $G \in \mathcal{H}_k$ then the optimal solution of the LP has $x^* = \mathbf{0}$, so $\hat{d}(G) = 0$ for all graphs
of max-degree $\le k$. Secondly, observe that in the process of computing $\mu(G)$, every edge that is
removed from $G$ can be "charged" to a vertex $v$ with $x^*_v \ge 1/4$. It follows that

$$d(G, \mu(G)) \le \sum_{v : x^*_v \ge 1/4} 1 \le \sum_{v : x^*_v \ge 1/4} 4x^*_v \le 4\sum_v x^*_v = \hat{d}(G).$$

Lastly, fix any neighboring $G_1, G_2 \in \mathcal{G}$, and let $v$ be the vertex whose edges differ in $G_1$ and $G_2$.
If $x^*$ is an optimal solution for LP($G_1$), then we set $y_v = 1$ and $y_u = x^*_u$ for every $u \neq v$ (keeping the weights $w^*$ on
the edges not incident to $v$ and putting weight 0 on the edges incident to $v$). Now
$y$ is a feasible (not necessarily optimal) solution to LP($G_2$). It is simple to infer that

$$\hat{d}(G_2) - \hat{d}(G_1) = \hat{d}(G_2) - 4\sum_u x^*_u \le 4\sum_u y_u - 4\sum_u x^*_u \le 4\sum_u |y_u - x^*_u| = 4|y_v - x^*_v| \le 4.$$

By symmetry, $|\hat{d}(G_1) - \hat{d}(G_2)| \le 4$.

□
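The rounding step of this proof can be sketched independently of how the LP is solved. The LP is not restated in this section, so the feasibility check below encodes only the constraints the proof relies on ($w_{u,v} + x_u + x_v \ge 1$ on every edge, $\sum_u w_{u,v} \le k$ at every vertex, non-negativity); its exact form is an assumption here. Given such a solution, the hypothetical helper `round_lp_solution` deletes every edge touching a vertex with $x_v \ge 1/4$ and reports $\hat d(G) = 4\sum_v x_v$. Solving the LP itself (e.g., with an off-the-shelf LP solver) is left out, and the degree bound $2k$ only needs feasibility, not optimality.

```python
def lp_feasible(adj, x, w, k, tol=1e-9):
    """Check the assumed LP constraints; w is keyed by frozenset({u, v})."""
    for v, nbrs in adj.items():
        if x[v] < -tol:
            return False
        if sum(w[frozenset((u, v))] for u in nbrs) > k + tol:  # sum_u w_uv <= k
            return False
        for u in nbrs:
            wuv = w[frozenset((u, v))]
            if wuv < -tol or wuv + x[u] + x[v] < 1 - tol:      # edge covering
                return False
    return True


def round_lp_solution(adj, x_star):
    """Rounding from the proof of Claim 17 (sketch): returns (mu(G), d_hat(G))."""
    heavy = {v for v, xv in x_star.items() if xv >= 0.25}
    mu = {v: (set() if v in heavy else {u for u in nbrs if u not in heavy})
          for v, nbrs in adj.items()}
    d_hat = 4 * sum(x_star.values())
    return mu, d_hat
```

On a graph that already has maximum degree at most $k$, the all-zero $x$ (with $w = 1$ on every edge) is feasible, so nothing is deleted and $\hat d = 0$.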

Reminder of Claim 20. For any local profile query $f$, we have that $RS_f(\mathcal{H}_k) \le 2k + 1$ in the
vertex adjacency model and $RS_f(\mathcal{H}_k) \le k + 1$ in the edge adjacency model.
Proof of Claim 20. Consider a local profile query $f_p$.

(Label change) Let $G_1, G_2 \in \mathcal{H}_k$ be two graphs with exactly the same edge set, but with labeling
functions $\ell_1, \ell_2$ that differ on a single vertex. Let $v$ be the vertex whose label differs in $G_1$
and $G_2$, and let $N_v$ denote the set of its (at most $k$) neighbors. Then for every $u \notin \{v\} \cup N_v$ we
have that $p(u, \langle G_1, \ell_1 \rangle) = p(u, \langle G_2, \ell_2 \rangle)$. Hence, $|f_p(G_1) - f_p(G_2)| \le |\{v\} \cup N_v| \le k + 1$.
(Vertex Adjacency) Let $G_1, G_2 \in \mathcal{H}_k$ be any two neighboring labeled graphs such that $G_1 - v =
G_2 - v$. Let $N^1_v$ (resp. $N^2_v$) denote the neighborhood of $v$ in $G_1$ (resp. $G_2$); then for any $y \notin N^1_v \cup N^2_v \cup \{v\}$ we have that
$p(y, \langle G_1, \ell_1 \rangle) = p(y, \langle G_2, \ell_2 \rangle)$. Hence, $|f_p(G_1) - f_p(G_2)| \le |N^1_v \cup N^2_v \cup \{v\}| \le 2k + 1$.
(Edge Adjacency) Let $G_1, G_2 \in \mathcal{H}_k$ be any two neighboring labeled graphs. Wlog, there is an edge
$e = \{u, v\}$ such that $E(G_1) = E(G_2) \cup \{e\}$. In order to have a vertex $y$ such that $p(y, \langle G_1, \ell_1 \rangle) \neq
p(y, \langle G_2, \ell_2 \rangle)$, the edge $e$ must appear in the graph we get by restricting the social network to
$y$ and its neighbors. It follows that the only vertices whose local profile can change lie in the
union $\{u, v\} \cup (N_u \cap N_v)$. Hence, $|f(G_1) - f(G_2)| \le |\{u\} \cup \{v\}| + |N_u \setminus \{v\}| \le 2 + k - 1 = k + 1$.
□
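The vertex-adjacency case can be sanity-checked on random instances. In this sketch, `triangle_at` (is $v$ on a triangle) is an example local profile we picked because it reads edges among $v$'s neighbors, and the graph generator and rewiring helper are hypothetical. The check confirms that only $v$ and its old and new neighbors change their profile, so the query moves by at most $2k+1$; a randomized check of this kind is evidence for the argument above rather than a replacement for it.

```python
import random


def triangle_at(adj, v):
    """Example local profile: 1 if two neighbors of v are adjacent, else 0 (depends only on G_v)."""
    nbrs = list(adj[v])
    return int(any(b in adj[a] for i, a in enumerate(nbrs) for b in nbrs[i + 1:]))


def random_bounded_graph(n, k, tries, rng):
    """Random graph on 0..n-1 with maximum degree at most k."""
    adj = {v: set() for v in range(n)}
    for _ in range(tries):
        u, v = rng.sample(range(n), 2)
        if len(adj[u]) < k and len(adj[v]) < k:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def rewire(adj, v, k, tries, rng):
    """Vertex-adjacency neighbor: replace v's edges at random, keeping all degrees <= k."""
    new = {u: set(nb) for u, nb in adj.items()}
    for u in new[v]:
        new[u].discard(v)
    new[v] = set()
    for _ in range(tries):
        u = rng.randrange(len(new))
        if u != v and len(new[u]) < k and len(new[v]) < k:
            new[u].add(v)
            new[v].add(u)
    return new
```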

Reminder of Claim 22. Let $f = (H, \bar{p})$ be a subgraph counting query and let $t = |H|$. Then

$$RS_f(\mathcal{H}_k) \le t k^{t-1}$$

in the edge adjacency model and in the vertex adjacency model.

Proof of Claim 22 (Sketch). Let $G_1, G_2 \in \mathcal{H}_k$ be neighbors, let $v$ be a vertex such that
$G_1 - v = G_2 - v$, and let $N_i$ denote the neighbors of $v$ in $G_i$. Any copy of $H$ which occurs in $G_1$ but
not in $G_2$ must contain $v$. Because $H$ is connected, we can bound the number of such copies of $H$ in $G_1$. We
start with $v$ and pick one of the $t$ vertices of $H$ to be mapped to $v$; denote this vertex by
$v_0$. Now we proceed inductively: at step $i$ we pick a vertex $v_i \in H$ connected to the set $\{v_0, v_1, \ldots, v_{i-1}\}$.
The vertex $v_i$ must be assigned to a vertex in $G_1$ which is adjacent to some specific vertex among the
$i$ vertices that we already mapped. Because $G_1$ has degree at most $k$, there are at most $k$
options from which to choose the image of $v_i$. We obtain the following bound:

$$f(G_1) - f(G_2) \le t \prod_{i=1}^{t-1} k = t k^{t-1}.$$

□
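The counting argument can be mirrored directly in code. The sketch below (function names are ours; $H$ must be connected with sortable vertex ids) counts injective maps $\phi : V(H) \to V(G)$ that send every edge of $H$ to an edge of $G$ and whose image contains $v$; the copies counted in Definition 21 form a subset of these. As in the proof, it first chooses which vertex of $H$ maps to $v$ ($t$ ways) and then extends $\phi$ along a breadth-first order of $H$, so every new vertex has at most $k$ candidate images when $G \in \mathcal{H}_k$.

```python
from collections import deque


def bfs_order(h_adj, root):
    """BFS order of H from root; every later vertex has an earlier neighbor (its parent)."""
    order, parent, seen = [root], {root: None}, {root}
    queue = deque([root])
    while queue:
        a = queue.popleft()
        for b in sorted(h_adj[a]):
            if b not in seen:
                seen.add(b)
                parent[b] = a
                order.append(b)
                queue.append(b)
    return order, parent


def copies_through(adj, h_adj, v):
    """Count injective, edge-preserving maps phi: V(H) -> V(G) whose image contains v."""
    total = 0
    for root in h_adj:  # t choices for the vertex v_0 of H that is mapped to v
        order, parent = bfs_order(h_adj, root)

        def extend(i, phi):
            if i == len(order):
                return 1
            b = order[i]
            count = 0
            for cand in adj[phi[parent[b]]]:  # at most k candidates when G is in H_k
                if cand in phi.values():
                    continue  # phi must be injective
                if all(cand in adj[phi[c]] for c in h_adj[b] if c in phi):
                    phi[b] = cand
                    count += extend(i + 1, phi)
                    del phi[b]
            return count

        total += extend(1, {root: v})
    return total
```

For example, on the 6-cycle ($k = 2$) with $H$ a path on 3 vertices, 6 maps pass through a given vertex, below the bound $t k^{t-1} = 12$.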

B Additional Claims 

Claim 23. (Privacy wrt Vertex Adjacency) Unless P = NP, there is no efficiently computable
mapping $\mu : \mathcal{G} \to \mathcal{H}_k$ such that

1. $\forall G \in \mathcal{H}_k$, $\mu(G) = G$.

2. $\forall G \in \mathcal{G}$, $d(G, \mu(G)) \le O(\ln(k))\, d(G, \mathcal{H}_k)$.

Proof. (Sketch) Our reduction is from the minimum set cover problem. It is NP-hard to approximate
the minimum set cover problem to within a factor better than $O(\log n)$ [2, 20]. Given a set cover
instance with sets $S_1, \ldots, S_n$ and universe $U = \{x_1, \ldots, x_m\}$, we set

$$m_i = |\{j : x_i \in S_j\}|,$$

and $k = n + 1$. We construct our labeled graph $G$ as follows:

1. Add a node for each $S_j$.

2. Add a node for each $x_i$.

3. Add the edge $\{x_i, S_j\}$ if and only if $x_i \in S_j$.

4. For each $x_i$, create $k + 1 - m_i$ fresh nodes $y_1, \ldots, y_{k+1-m_i}$ and add each edge $\{y_j, x_i\}$.

Intuitively, each node $x_i$ has $k + 1$ incident edges. By deleting all of the edges incident to the
node $S_j$ we can fix all of the nodes $x_i \in S_j$. Hence, $d(G, \mathcal{H}_k)$ corresponds exactly to the size of
the minimum set cover.

□
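The reduction can be replayed on a tiny instance. In this sketch (helper names are ours), vertex ids are tagged tuples, $k = n + 1$ with $n$ the number of sets, and $d(G, \mathcal{H}_k)$ is computed by brute force as the fewest vertices whose incident edges must all be deleted; treating "delete every incident edge" as the best rewiring of a vertex is an assumption of the sketch (deleting edges never raises a degree). Both brute-force searches are exponential and only meant for toy inputs.

```python
from itertools import combinations


def set_cover_graph(sets, universe):
    """Graph from the Claim 23 reduction (sketch); returns (adj, k) with k = n + 1."""
    k = len(sets) + 1
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for j, s in enumerate(sets):
        adj.setdefault(("S", j), set())        # step 1: a node per set
        for x in s:
            add_edge(("S", j), ("x", x))       # step 3: x_i -- S_j iff x_i in S_j
    for x in universe:
        adj.setdefault(("x", x), set())        # step 2: a node per element
        m = sum(1 for s in sets if x in s)     # m_i
        for i in range(k + 1 - m):
            add_edge(("x", x), ("y", x, i))    # step 4: k + 1 - m_i fresh leaves
    return adj, k


def distance_to_bounded_degree(adj, k):
    """Brute-force d(G, H_k): fewest vertices whose edges must all be deleted."""
    nodes = list(adj)
    for r in range(len(nodes) + 1):
        for chosen in combinations(nodes, r):
            c = set(chosen)
            if all(v in c or len(adj[v] - c) <= k for v in nodes):
                return r


def min_set_cover(sets, universe):
    """Brute-force size of a minimum set cover."""
    for r in range(len(sets) + 1):
        for idx in combinations(range(len(sets)), r):
            if set().union(*(sets[i] for i in idx)) >= set(universe):
                return r
```

With sets $\{1, 2\}, \{2, 3\}, \{3\}$ over $\{1, 2, 3\}$, both searches return 2.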

Claim 24. (Privacy wrt Vertex Adjacency) There is an efficiently computable projection $\mu : \mathcal{G} \to
\mathcal{H}_k$ such that for every $G \in \mathcal{G}$ it holds that $d(G, \mu(G)) \le \ln(2d^2 + kd) \cdot d(G, \mathcal{H}_k)$, where $d = d(G, \mathcal{H}_k)$.

Proof. (Sketch) We use a greedy algorithm to create $\mu$. Define the potential of a graph $G$ as follows:

$$\phi(G) = \sum_{v \in G : \deg(v) > k} (\deg(v) - k).$$

Our algorithm $\mu$ starts by guessing a value $d$ for $d(G, \mathcal{H}_k)$ and deleting any vertex with degree
$> k + d + 1$ (these vertices must be deleted because their degree will still be at least $k + 1$ after deleting $d$
other vertices). Then $\mu$ repeatedly picks the vertex $v$ with the highest potential and eliminates all of its
incident edges, where the potential of a vertex $v$ is $\phi(G) - \phi(G - v)$. Let $\phi_i$ denote the potential
after round $i$ ($\phi_0$ is the potential after deleting the vertices with degree $> k + d + 1$). Observe that
$\phi_0 \le 2d^2 + kd$ if $d$ is correct, because (1) there are $d$ vertices we can delete to drop the potential to 0,
and (2) deleting a single vertex $v$ decreases the potential by at most $\deg(v) + \deg(v) - k \le 2d + k$. Also
observe that in any round there always exists some vertex whose removal decreases the potential
by at least $\phi_i / d$, so we have

$$\phi_{i+1} \le \left(1 - \frac{1}{d}\right)\phi_i \le \left(1 - \frac{1}{d}\right)^{i+1}\phi_0 \le e^{-(i+1)/d}\left(2d^2 + kd\right).$$

Once $i > d \ln(2d^2 + dk + d)$ we have $\phi_i < 1$, and hence $\phi_i = 0$ because the potential is an integer. □
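The greedy procedure from this proof sketch can be written out as follows, taking the guess $d$ as a parameter (how $\mu$ searches over guesses is not spelled out above, so that part is omitted). "Deleting" a vertex here means removing all of its incident edges, and the helper names are our own. The sketch is deliberately unoptimized (it recomputes the potential for every candidate vertex), and its tests only check that the output lies in $\mathcal{H}_k$, not the approximation ratio.

```python
def potential(adj, k):
    """phi(G): total excess degree over k."""
    return sum(len(nb) - k for nb in adj.values() if len(nb) > k)


def delete_vertex_edges(adj, v):
    """Copy of adj with every edge incident to v removed."""
    new = {u: set(nb) for u, nb in adj.items()}
    for u in new[v]:
        new[u].discard(v)
    new[v] = set()
    return new


def greedy_projection(adj, k, d):
    """Greedy projection from the proof sketch of Claim 24, for a guessed value d."""
    # Vertices of degree > k + d + 1 must be deleted in any solution of size d.
    deleted = [v for v, nb in adj.items() if len(nb) > k + d + 1]
    g = adj
    for v in deleted:
        g = delete_vertex_edges(g, v)
    # Repeatedly delete the vertex whose removal lowers the potential the most.
    while potential(g, k) > 0:
        phi = potential(g, k)
        best = max(g, key=lambda v: phi - potential(delete_vertex_edges(g, v), k))
        g = delete_vertex_edges(g, best)
        deleted.append(best)
    return g, deleted
```

The loop always terminates: while $\phi > 0$ some vertex has degree above $k$, and deleting it lowers the potential by at least one.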

The construction in Claim 24 might be used to produce a function $f_{\mathcal{H}}(G) = f(\mu(G))$ with low
smooth sensitivity over the nice graphs $\mathcal{H}_k$. Unfortunately, we do not know of any efficient algorithm
to compute the smooth upper bound for such $f_{\mathcal{H}}$.

Claim 25. Let $f = (K_3, \bar{p})$ be a subgraph counting query with predicates $p_i$ that are identically true.
In the vertex adjacency model, for any $\beta$-smooth upper bound $S_{f,\beta}$ on the local sensitivity of $f$ and any
graph $G$ we have

$$S_{f,\beta}(G) \ge \exp(-2\beta)(n - 2).$$

Proof. Let $G$ be given. Pick $v_1, v_2 \in V(G)$ and let $G_1$ be obtained from $G$ by adding all possible
edges incident to $v_1$, and let $G_2$ be obtained from $G_1$ by deleting all edges incident to $v_2$. Finally,
let $G_3$ be obtained from $G_2$ by adding all possible edges incident to $v_2$. Now the local sensitivity
of $f$ at $G_2$ is at least $n - 2$, since every triangle $\{v_1, v_2, u\}$ with $u \notin \{v_1, v_2\}$ appears in $G_3$ but not in $G_2$:

$$LS_f(G_2) = \max_{G' : d(G_2, G') = 1} |f(G_2) - f(G')| \ge f(G_3) - f(G_2) \ge n - 2.$$

Since $G$ and $G_2$ differ only in the edges incident to $v_1$ and $v_2$, we have $d(G, G_2) \le 2$. Plugging this lower bound into the definition of a $\beta$-smooth upper bound we obtain the required result:

$$S_{f,\beta}(G) \ge e^{-\beta d(G, G_2)} LS_f(G_2) \ge e^{-2\beta}(n - 2).$$

□
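The three-step construction in this proof is easy to replay on a concrete graph. In this sketch the helper names are ours and vertices are integers (so they can be sorted). Whatever graph $G$ we start from, $f(G_3) - f(G_2) \ge n - 2$, and $G$ and $G_2$ agree on every edge that avoids $v_1$ and $v_2$, which is why $d(G, G_2) \le 2$.

```python
from itertools import combinations


def triangles(adj):
    """f = (K_3, always-true predicates), counted as unordered vertex triples."""
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])


def with_star_at(adj, v):
    """Add every possible edge incident to v."""
    new = {u: set(nb) for u, nb in adj.items()}
    for u in new:
        if u != v:
            new[u].add(v)
            new[v].add(u)
    return new


def isolate(adj, v):
    """Delete every edge incident to v."""
    new = {u: set(nb) - {v} for u, nb in adj.items()}
    new[v] = set()
    return new


def claim25_chain(adj, v1, v2):
    """G -> G_1 (star at v1) -> G_2 (isolate v2) -> G_3 (star at v2)."""
    g1 = with_star_at(adj, v1)
    g2 = isolate(g1, v2)
    g3 = with_star_at(g2, v2)
    return g1, g2, g3
```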

C Local Profile Queries 

Local profile queries are a natural extension of predicates to social networks, which can be used
to study many interesting properties of a social network like clustering coefficients, local bridges
and 2-betweenness. The clustering coefficient $c(v)$ [17, 22] of a node $v$ (i.e., the probability that
two randomly selected friends of $v$ are friends with each other) has been used to identify teenage
girls who are more likely to consider suicide [3]. One explanation is that it becomes an inherent
source of stress if a person has many friends who are not friends with each other [10]. Observe
that $c(v)$ is a local profile. An edge $\{v, w\}$ is a local bridge if its endpoints have no friends
in common. A local profile could score a vertex $v$ based on the number of local bridges incident to $v$.
A marketing agency may be interested in identifying nodes that are incident to many local bridges
because local bridges "provide their endpoints with access to parts of the network - and hence
sources of information - that they would otherwise be far away from" [10]. For example, a 1995
study showed that the best job leads often come from acquaintances rather than close friends [12].
2-betweenness (a variant of betweenness [11]) measures the centrality of a node. We say that the
2-betweenness of a vertex $v$ is the probability that a randomly chosen shortest path between two
randomly chosen neighbors $x, y$ of $v$ in $G_v$ goes through $v$.
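The three local profiles described above can be computed from the ego network $G_v$ alone. The sketch below is one reading of the definitions, with illustrative function names. In particular, it measures the shortest paths for 2-betweenness inside $G_v$, as the definition specifies. There, the path between two neighbors $x, y$ has length 1 if they are adjacent, and otherwise has length 2, passing through $v$ or through another common neighbor. To turn the local-bridge count into a profile with values in $[0, 1]$, it would still need normalizing, for instance by $\deg(v)$.

```python
from itertools import combinations


def ego(adj, v):
    """G_v: v, its neighbors, and all edges among them."""
    keep = {v} | adj[v]
    return {u: adj[u] & keep for u in keep}


def clustering(adj, v):
    """c(v): fraction of pairs of v's neighbors that are adjacent."""
    gv = ego(adj, v)
    pairs = list(combinations(gv[v], 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b in gv[a]) / len(pairs)


def local_bridges_at(adj, v):
    """Number of edges {v, w} whose endpoints have no common neighbor."""
    gv = ego(adj, v)
    return sum(1 for w in gv[v] if not (gv[v] & gv[w]))


def two_betweenness(adj, v):
    """Probability that a random shortest path in G_v between two random
    distinct neighbors x, y of v passes through v."""
    gv = ego(adj, v)
    pairs = list(combinations(gv[v], 2))
    if not pairs:
        return 0.0
    total = 0.0
    for x, y in pairs:
        if y in gv[x]:
            continue  # the unique shortest path is the edge {x, y}
        # length-2 paths go through a common neighbor of x and y (v is one of them)
        total += 1 / len(gv[x] & gv[y])
    return total / len(pairs)
```

For the center of a star, every pair of leaves is joined only through the center, so its 2-betweenness is 1.0 and its clustering coefficient is 0.0.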